4 research outputs found

    Using TXM Platform for Research on Language Changes over Time: the Dynamics of Vocabulary and Punctuation in Russian Literary Texts

    The purpose of this paper is to test the methodological tools provided by the TXM open-source software for research on the dynamics of vocabulary and punctuation marks in diachronic corpora. TXM provides both quantitative and qualitative analysis features. It is shown that the Russian Revolution of 1917 did cause significant changes in the core vocabulary of the Corpus of Russian Short Stories (1901-1930). The same methodology may be used both for diachronic studies of literature and for various NLP tasks.

    Pragmatic Markers in Russian Spoken Speech: an Experience of Systematization and Annotation for the Improvement of NLP Tasks

    Pragmatic markers are an integral part of spontaneous spoken speech; however, they still have no systematic scientific description. These speech elements perform mostly pragmatic functions and are characterized by an almost complete absence (or significant weakening) of lexical and/or grammatical meaning. The frequency of pragmatic markers in speech exceeds that of almost all content words. For this reason, improving many current NLP tasks requires a proper systematization of pragmatic markers and effective, reliable schemes for their annotation. In the current research, we describe a preliminary set of pragmatic marker categories and present the results of two stages of their pilot annotation, carried out independently by a group of experts.